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THE  TOTAL  VARIATION  REGULARIZED  MODEL  FOR 
MULTISCALE  DECOMPOSITION 

WOTAO  YIN*,  DONALD  GOLDFARBt,  AND  STANLEY  OSHER* 

Abstract.  This  paper  studies  the  total  variation  regularization  model  with  an  fidelity  term 
(TV-L^)  for  decomposing  an  image  into  features  of  different  scales.  We  first  show  that  the  images 
produced  by  this  model  can  be  formed  from  the  minimizers  of  a  sequence  of  decoupled  geometry 
subproblems.  Using  this  result  we  show  that  the  TV-L^  model  is  able  to  separate  image  features 
according  to  their  scales,  where  the  scale  is  analytically  defined  by  the  G-value.  A  number  of  other 
properties  including  the  geometric  and  morphological  invariance  of  the  TV-L^  model  are  also  proved 
and  their  applications  discussed. 

Key  words,  multiscale  image  decomposition,  total  variation,  distance,  feature  selection, 
geometry  optimization. 
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1.  Introduction.  Let  a  grey-scale  n-dimensional  image  be  represented  by  a 
function  /  on  a  domain  of  In  this  paper,  we  restrict  our  discussion  to  typi¬ 
cal  2-dimensional  open  domains  (typically,  or  (0,1)^).  The  TV-L^  model 

obtains  a  decomposition  of  /  by  solving  the  following  model: 

(1.1)  inf  rV(u)  +  A||/-u||ii, 

u£BV 

for  the  minimizer  u\  and  v\  =  f  —  u\,  where  BV  is  the  space  of  functions  of  bounded 
variation,  TV{u)  is  the  total  variation  of  u,  and  /  G  The  latter  is  needed 

for  technical  reasons  given  below.  Previous  work  on  this  model  for  image/signal  pro¬ 
cessing  includes  Alliney’s  pioneering  study  [3,  4,  5]  of  the  discrete  version  of  (1.1), 
Nikolova’s  [34,  35,  36]  discovery  of  the  usefulness  of  this  model  for  removing  impul¬ 
sive  noise,  Chan  and  Esedoglu’s  [17]  further  analysis  of  this  model,  and  a  series  of 
applications  of  this  model  in  computer  vision  by  Chen  et  al.  [22,  20,  21]  and  in 
biomedical  imaging  Yin  et  al.  [44].  Related  work  that  was  carried  out  at  the  same 
time  as  or  after  our  work  includes  Kawohl  and  Schuricht’s  [29]  investigation  of  the 
1-Laplace  operator  div(Dt6/|DM|),  Darbon’s  [23]  analysis  of  the  discrete  TV-L^  model 
as  a  contrast  invariant  filter,  Allard’s  [2]  analysis  of  the  total  variation  regularization 
using  a  geometric  measure  theory  approach,  and  Vixie  and  Esedoglu’s  [43]  work  on 
characterizing  the  solutions  of  (1.2)  below. 

In  this  paper  we  extend  the  existing  analysis  of  the  TV-L^  model.  In  particular, 
we  show  its  equivalence  to  the  non-convex  geometry  problem: 

(1.2)  ininPer([/) -I- A  |5'A[/|, 
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where  Per([/)  is  the  perimeter  of  the  set  U,  SAU  =  {S\U)  U  {U\S)  is  the  symmetric 
difference  of  the  sets  S  and  U,  and  |S'A[/|  is  its  area.  We  use  this  equivalence  to 
obtain  results  on  TV-L^’s  scale-based  feature  selection  properties.  We  first  give  a 
brief  introduction  to  the  framework  of  total  variation-based  image  processing  models 
and  the  space  of  functions  of  bounded  variation. 

1.1.  Notation.  We  use  following  notation  throughout  this  paper  (the  TV  and 
Per  functionals  are  defined  in  the  next  section): 

1.  TheTV-Li  energy:  E{u-,  \,  f)  =  TV{u)  +  X\\u  -  fUi; 

2.  The  minimizer  of  E(w;  A,  /):  u*,  which  may  also  be  subscripted  by  A; 

3.  The  TV-T^  operator:  Tt  ■  f  denotes  a  minimizer  u*  of  E(m;  1/t, /)  (if  there 
are  multiple  minimizers,  Tt  •  /  gives  a  specific  u*); 

4.  Composite  operator:  TgoTff  =  Ts-{Tff)  denotes  a  minimizer  of  E(u;  1/s,  {Tg- 

/)); 

5.  The  super  level  set:  L{f,fj,)  =  {a;  G  dom(/)  :  f{x)  >  ^}; 

6.  The  geometry  energy:  Eg([/;  A,  S')  =  Per({7) -I- A  I^AC/I; 

7.  The  minimizer  of  Eg(C/;A,S):  U*,  which  may  also  be  subscripted  by  the 
penalty  parameter  A  or  the  level  parameter  /i,  when  S  =  L{f,  /i),  to  emphasize 
the  dependence  on  A  or  /i,  respectively; 

8.  For  two  sets  S  and  T,  S  +  T  :=  SUT  when  SHT  =  0  is  (implicitly)  assumed; 

9.  For  two  sets  S  and  T,  S  —  T  :=  S\T  when  S  3  T  is  (implicitly)  assumed. 

1.2.  Total  variation  (TV)  models.  A  general  framework  for  obtaining  a  de¬ 
composition  of  an  image  /  into  a  regular  part  u  and  an  irregular  part  v  is  to  solve 
the  problem 

(1.3)  inf  {||s(m)|U  I  \\t{u,f)\\B<(7}, 

U 

for  u,  where  s(-)  and  t(-,  •)  are  two  functionals  on  appropriate  spaces  and  ||  •  m  and 
II  •  ||b  are  norms  (or  semi-norms).  ||  •  ||a  and  s(-)  should  be  chosen  so  that  ||s(w)m 
is  small  for  regular  signals  u  but  much  bigger  for  irregular  noise  v.  Then,  minimizing 
||s(t6)m  is  equivalent  to  regularizing  u  according  to  the  measure  ||s(u)||yi-  A  typical 
choice  for  ||s(M)||yi  is  J\Du\p,  where  u  G  BV,  the  space  of  functions  of  bounded 
variation  (see  Def.  1.1),  and  Du  denotes  the  generalized  derivative  of  u.  For  p  >  1, 
minimizing  f  |iAu|P  tends  to  produce  rather  smooth  functions.  In  particular,  p  =  2 
gives  Tihoknov  regularization.  Therefore,  to  keep  edges  like  object  boundaries  in  u 
(i.e.  to  allow  discontinuities  in  u),  one  should  use  p  =  1.  An  adaptive  combination  of 
these  semi-norms  can  be  used  to  keep  sharp  edges  while  avoiding  staircase  effects  in 
regions  where  the  image  varies  smoothly.  The  fidelity  term  \\t(u,  /)||b  <  cr  forces  u  to 
be  close  to  /.  t{u,  /)  is  often  chosen  to  be  /  —  m  =  v.  The  choice  of  a  particular  norm 
depends  on  the  application.  In  image  denoising,  a  common  choice  (known  as  the  ROF 
model)  is  \\t{u,f)\\B  =  \\f  —  mUl^,  which  is  small  if  /  —  u  is  noise.  The  ROF  model 
[39]  by  Rudin,  Osher,  and  Fatemi  was  the  first  use  of  total  variation  regularization  in 
image  processing.  In  deblurring  with  denoising,  for  example,  ||t(u,  /)||b  =  \\f-Au\\L2 
is  commonly  used,  where  A  is  a  blurring  operator.  The  pioneering  ROF  model  led 
the  way  to  a  rich  area  of  total  variation-based  image  processing.  See  [18]  for  a  survey. 

If  ||s(w)m  and  ||t(w,  /)||b  are  convex  in  u,  the  constrained  minimization  problem 
min{||s(t6)m  s.t.  ||t(M)||B  <  a}  is  equivalent  to  its  Lagrangian  form  min{||s(M)m  -I- 
A||t(M, /)||b},  where  A  is  the  Lagrange  multiplier  for  the  constraint  ||t(u)||B  <  cr.  The 
two  problems  have  the  same  solution  if  A  is  chosen  equal  to  the  optimal  value  of  the 
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dual  variable  corresponding  to  the  constraint  in  the  constrained  problem.  Given  cr  or 
A,  one  can  calculate  the  other  value  by  solving  the  corresponding  problem. 

1.3.  The  BV  and  G  spaces  and  norms.  We  now  formally  define  the  Banach 
space  BV  of  functions  with  bounded  variation  and  the  Banach  space  G,  which  is  dual 
to  a  subspace  of  BV,  and  norms  defined  on  these  spaces.  We  provide  these  for  the 
completeness.  They  are  relevant  to  some  of  the  results  in  Section  7.  Readers  familiar 
with  the  theoretical  foundation  of  total  variation  [7]  can  skip  this  subsection. 

Definition  1.1.  [4.6]  Let  u  €  ,  and  define 

TV{u)  :=  sup  /  f  udiv{g)dx  :  5  €  C'q(M";  M"),  |g(a;)|;2  <  1 /or  aZZ  a;  €  M" 


and  ||u||bi/  :=  ||m||li  +TV{u),  where  C'q(IR";IR")  denotes  the  set  of  continuously 
differentiable  vector-valued  functions  that  vanish  at  infinity.  The  Banach  space  of 
functions  with  hounded  variation  is  defined  as 

BV  =  {u  e  :  ||u||bi/  <  00}  , 

and  is  equipped  with  the  ||  •  \\BV-norm.  Moreover,  TV{u)  is  a  semi-norm.  BV{Ll) 
with  Q,  being  a  bounded  open  domain  is  defined  analogously  to  BV  with  and 
replaced  by  L^(n)  and  CQ(n;M”),  respectively. 

If  u  is  in  the  smaller  Sobolev  space  and  Du  is  the  generalized  derivative 

of  u,  then  we  have  from  the  definition  above,  using  integration  by  parts: 

(1.4)  ueBVnW^’^iW^)^  sup  ( Du-g<oo. 

geCo(R";R")  Is(a:)|j2<l  j 


We  can  also  see  from  (1.4)  that  each  u  defines  a  bounded  linear  functional  Lu{g) 
on  170(1^";®")  [12].  Using  the  Riesz  representation  theorem  (also  referred  to  as  the 
Riesz-Markov  theorem)  on  the  isomorphism  between  the  dual  of  Co(M”;IR”)  and  the 
set  of  bounded  vector  Radon  measures,  we  immediately  have  the  following  equivalent 
and  often  used  definition: 


BV  =  {u  :  Du  is  a  bounded  Radon  vector  measure  on  M"}  . 

When  Du  is  considered  as  a  measure,  TV(u)  over  a  set  fl  C  M"  equals  the  total 
variation  of  Du  as  the  Borel  positive  measure  over  12.  This  is  given  by 

{n  n 

ElT“T.)LU  Ei  C  12,  Efs  are  disjoint  Borel  sets 

i=l  i=l 

where  the  Borel  sets  are  the  cr-algebra  generated  by  the  open  sets  in  M".  This  is 
true  because  each  g  S  (70(1^”;®")  such  that  II5II/2  <  1  is  the  limit  of  a  series  of 
[— 1,  l]-valued  vector  functions  that  are  piecewise  constant  on  Borel  sets. 

In  the  dual  space  of  (7o(K";  M")  we  define  weak-*  convergence  of  Dun  to  Dn  as 


lim 

n— ^00 


Dun  ■  g  = 


Du  ■  g, 


for  all  ge  (7o(M";M"). 
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Sets  in  M”  with  finite  perimeter  are  often  referred  to  as  BV  sets.  The  perimeter 
of  a  set  S  is  defined  as  follows: 

(1.5) 

Per(S')  :=TP(ls)  =sup|y  div{g)dx  :  g  G  K"),  |g(a;)|,2  <  1,V  xG  M” 

where  Is  is  the  indicator  function  of  S. 

Next,  we  define  the  space  G  [32]. 

Definition  1.2.  Let  G  denote  the  Banach  space  consisting  of  all  generalized 
functions  v{x)  defined  on  M"  that  can  be  written  as 

(1.6)  T  =  div(g),  5=[5,]i^i,..„„GL“(M";K"), 

and  equipped  with  the  norm  ||t||g  defined  as  the  infimum  of  all  L°°  norms  of  the  func¬ 
tions  \g{x)\i2  over  all  decompositions  (1-6)  ofv.  In  short,  ||t||g  =  inf{||  \g(x)\i2  jj^oo  : 

V  =  div(g)}.  G  is  the  dual  of  the  closed  subspace  BV  of  BV,  where  BV  :=  {u  G 
BV  :  \Du\  G  L^}  [32].  We  note  that  finite  difference  approximations  to  functions  in 
BV  and  in  BV  are  the  same.  For  the  definition  and  properties  of  G{Vt),  see  [11]. 

An  immediate  consequence  of  Definition  1.2  is  that 

(1.7)  J  uv  =  J  uV  ■g  =  -  J  Du-g<  UDm]]  IJuUg, 

holds  for  any  u  G  BV  with  compact  support  and  v  G  G.  We  say  (m,  v)  is  an  extremal 
pair  if  (1.7)  holds  with  equality.  This  result  was  recently  used  by  Kindermann,  Osher 
and  Xu  [30]  to  recover  /  from  an  ROF  generated  v,  which  may  have  applications  in 
image  encryption. 

The  rest  of  the  paper  is  organized  as  follows.  In  Section  2  we  present  a  simple  ex¬ 
ample  to  introduce  some  preliminaries  and  existing  results  on  the  TV-L^  model.  Sec¬ 
tion  3  is  devoted  to  the  monoticity  property  of  the  geometry  problem  min[/^R2  Per([/)-|- 
A|S'A[/|,  which  serves  as  a  basis  for  the  rest  of  the  paper.  In  Section  4  we  use  the 
results  in  Section  3  to  construct  the  solution  of  the  TV-L^  model  and  discuss  compu¬ 
tational  methods  based  on  this  construction.  Sections  5  and  6  discuss  feature  selection 
and  geometric  and  morphological  invariance  properties  of  the  TV-L^  model.  In  Sec¬ 
tion  7  we  take  a  different  analytic  approach  to  establish  the  relationship  between  an 
approximate  and  the  exact  TV-L^  model.  Numerical  results  illustrating  properties  of 
the  model  are  given  in  Section  8.  Some  technical  results  are  given  in  Appendices  A 
and  B. 

2.  Preliminary  properties  of  the  TV-T^  model.  Let  us  first  see  how  the 

TV-L^  model  (1.1)  acts  on  the  simple  2-dimensional  image 


/  —  c1b,.(o)) 


which  has  intensity  c  >  0  at  all  points  (xi,X2)  inside  the  disk  Br{0)  =  {(xi,X2)  : 
Xi  X2  <  r^}  and  intensity  0  everywhere  else. 

Chan  and  Esedoglu  [17]  showed  that  for  this  /,  solving  (1.1)  gives 

[0,  ifO<A<f, 

U\=  I  c'f  for  any  c'  G  [0, 1],  if  A  =  f , 

[/,  ifA>f. 
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In  general  the  minimizer  of  the  TV-L^  is  nonunique.  In  the  above  disk  example,  if 
A  =  2/r,  problem  (1.1)  has  an  infinite  number  of  minimizers. 

Two  surprising  points  about  this  example  are  worth  mentioning.  First,  the  signal 
is  never  corrupted  by  the  TV-L^  decomposition:  for  all  values  of  A  except  2/r,  where 
nonuniqueness  occurs,  the  entire  signal  of  /  is  either  completely  contained  in  or 
completely  contained  in  This  is  not  case  in  the  ROF  (TV-L^)  decomposition. 
Strong  and  Chan  [42]  showed  that,  for  this  /,  the  ROF  model  gives  u\  =  c\f,  where 
c/  is  a  constant  scalar  lying  in  [0, 1),  never  reaching  1  exactly.  In  other  words,  even  if 
the  input  image  /  is  completely  noiseless,  there  does  not  exist  a  value  of  A  that  gives 
an  uncorrupted  output  in  the  ROF  model.  Meyer  [32]  characterized  this  phenomenon 
using  the  G-norm:  if  ||/||g  >  then  the  of  the  ROF  model  satisfies  ||r^||G  = 
otherwise,  if  |1/|1g  <  the  decomposition  is  meaningless  since  u\  =  0  and  uj  =  /. 
Since  corresponds  to  signal  noise  in  the  ROF  model,  Meyer’s  analysis  indicates 
that  the  ROF  model  corrupts  a  noiseless  input.  However,  it  is  desirable  that  noise- 
free  images  or  image  portions  be  kept  invariant  by  a  signal-noise  decomposition.  To 
achieve  this,  a  remedy  proposed  by  Osher  et  al  in  [37]  for  the  ROF  model  iterates  the 
decomposition  using  fk  =  f  +  Vk-i-  In  contrast  to  the  ROF  model,  the  TV-L^  model 
keeps  the  integrity  of  the  signal  in  its  decomposition. 

The  second  surprising  property  of  the  TV-L^  model  exposed  by  this  example  is 
that  the  value  of  A  at  which  u*^  switches  from  0  to  /  depends  on  the  size  of  signal  (i.e., 
the  radius  of  the  disk) ,  and  not  on  its  intensity  c.  This  is  very  surprising  because  the 
fidelity  term  ]]/  —  u\\]^i  clearly  penalizes  the  intensity  difference  between  /  and  u* . 
For  the  aforementioned  disk  signal  [17]  or  inputs  like  an  annulus  or  a  set  of  concentric 
annuli,  and  more  general  inputs  with  level  sets  contained  inside  a  convex  outer  curve, 
one  may  derive  analytic  solutions  using  a  symmetry  argument  (e.g.,  see  Appendix  A) 
or  total  variation  flow  (see  [6,  13,  14,  9,  10]  for  extensive  studies  of  the  minimizers 
of  minu^^5  Per([/)  —  A|[/|  when  S  is  convex  or  has  an  external  boundary  that  is  a 
convex  curve  using  total  variation  flow);  but  how  should  one  choose  A  for  an  image 
that  may  contain  signals  of  different  scales  and  arbitrary  shapes  so  as  to  isolate  in 
u\  certain  of  these  signals?  In  Sections  5  and  6,  we  give  a  rigorous  proof  of  this 
intensity-independent  property  of  the  TV-L^  model  for  general  image  inputs.  This 
property  was  observed  by  Chan  and  Esedoglu  and  mentioned  in  their  work  [17]. 

Even  though  Alliney  and  Nikolova  did  not  explicitly  draw  these  conclusions  in  an 
analytic  way  in  their  papers  [3,  4,  5,  34,  35],  they  made  related  observations,  and  their 
successful  attempts  of  applying  the  1D/2D  TV-L^  to  signal  processing  were  based  on 
these  properties.  Alliney  studied  the  ID  and  discrete  version  of  the  TV-L^  energy 
and  proved  that  his  recursive  median  filter  can  construct  u\  directly.  Many  of  his 
ID  results  were  later  extended  to  2D  or  higher  dimensional  spaces  in  [17].  Nikolova 
focused  on  the  minimization  of  non-differentiable  data  fidelity  terms,  including  the 
V"  fidelity  term,  and  presented  impressive  and  successful  applications  of  the  TV-L^ 
model  to  impulsive  noise  removal  and  outlier  identification.  She  observed  that  u* ,  the 
reconstructed  image,  was  exact  at  some  pixels  and  related  this  finding  to  the  property 
of  contrast-invariance.  Later,  Chan  and  Esedoglu  [17]  compared  the  continuum  of 
ROF  and  TV-L^  energies  and  studied  the  geometric  properties  of  u\.  Some  of  their 
results  are  quoted  below. 

Proposition  2.1.  The  TV-L^  energy  of  a  function  f  can  be  expressed  as  an 
integral  of  the  geometry  energies  of  the  super  level  sets  of  f;  i.e., 

pOO 

E('u;A,/)=  /  EG(L(w,M);A,L(/,Ai))d/r. 


(2.1) 


—  OO 
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Formula  (2.1),  which  is  called  the  Layer  Cake  formula  in  [17],  can  be  obtained  by 
combining  the  co-area  formula  [26]  /  jVuj  =  /_  Per(L(t6, /i))d/r  with  the  formula 
/  \u-  f\dx  =  \L{u,^i)AL{f,  fL)\d^i. 

Proposition  2.2.  If  the  observed  image  is  a  characteristic  function  of  a  set 
S,  i.e.,  f  =  Is,  and  =  min„  E(m;  A,  Ig),  then  for  any  yc  G  [0,1)  also 

minimizes  E(m;  A,  Ig). 

An  equivalent  of  the  above  proposition  is 

Proposition  2.3.  Any  [0,1)  level  set  of  the  minimizer  u\  o/E(m;A,  Is)  solves 
the  geometry  problem 

(2.2)  min  Eg (C/;  A,  A). 


Propositions  2.2  and  2.3  state  that  the  non-convex  geometry  problem  (2.2)  can  be 
converted  into  the  TV-L^  problem  (1.1),  which  is  a  convex  problem.  This  geometry 
problem  finds  applications  in  removing  isolated  binary  noise  and  enhancing  binary 
fax  images.  Given  a  binary  signal  S,  one  can  solve  the  TV-L^  problem  with  input 
/  =  Is.  If  the  solution  u*  is  not  binary,  then  one  should  examine  its  level  sets. 

In  the  rest  of  the  paper,  we  exploit  the  converse  of  this  process:  given  an  observed 
image  /,  one  can  solve  a  series  of  geometry  problems  (2.2)  and  use  the  series  of 
solutions,  which  are  sets,  to  construct  u*  explicitly. 

3.  The  TV-L^  geometry  problem.  In  [17]  Chan  and  Esedoglu  raised  the 
following  question  about  the  geometry  problem  (2.2): 

If  Si  C  S2  and  Uf  and  U2  are  minimizers  of  the  geometry  problem 
(2.2)  with  inputs  S  =  Si  and  S  =  S2,  respectively,  is  Uf  C 
While  the  absolute  answer  is  “no” ,  the  answer  is  affirmative  for  a  variant  of  the  above 
question: 

Theorem  3.1.  Suppose  that  Si  C  S2  and  Uf  and  Uf  are  minimizers  of  the 
geometry  problem  (2.2)  with  inputs  S  =  Si  and  S  =  S2, 

1.  if  either  U(  or  [/J  is  a  unique  minimizer,  then  U(  C  GJ ; 

2.  otherwise,  i.e.,  both  problems  with  input  S  =  Si  and  S  =  S2  have  multiple 
solutions,  there  exists  a  solution  pair  (C/f,[/|)  such  that  U(  C  Uf. 

This  section  focuses  on  proving  Theorem  3.1,  which  is  used  in  the  next  section  to 
construct  a  solution  u*  of  the  TV-T^  problem  from  a  series  of  Uf’s  from  input  sets 
S  =  L{f,y)  for  all  values  of  y.. 

Before  we  present  this  proof,  let  us  see  why  the  answer  to  Chan  and  Esedoglu’s 
question  is  negative: 

Example  Let  A  =  2/r  and  the  input  sets  Si  =  Br(0)  and  S2  =  Br{0)  U  Br{x)  where 
a;  is  a  point  distant  from  the  origin  0.  Clearly,  Si  C  S2  strictly.  However,  there  exist 
solutions 


Uf  =  Brio)  and  Uif  =  0 

of  the  geometry  problem  (2.2)  for  S  =  Si  and  S  =  S2,  respectively,  where  U(  D  U2 
strictly.  According  to  the  results  in  [13,  14],  both  problems  have  multiple  solu¬ 
tions.  For  S  =  ^i,  the  set  of  minimizers  is  {0, 73^(0)}  and,  for  S  =  S2,  this  set 
is  {0,  Br{0),  Br{x),  Br{0)  U  Br{x)}. 

Assumption  A:  In  the  rest  of  this  section,  we  assumer  U(  and  [/|  are  minimizers 
([/f  C  U2  may  not  hold)  of  EG(t7;A,  S')  with  S  =  Si  and  S  =  S2  for  a  fixed  A, 
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respectively,  and  Si  C  82-  Since  A  is  fixed,  we  omit  A  and  write  Eg(C/;A,  S')  as 
Eg(C7;  S).  Moreover,  we  define  =  U*  H  C/|  and  U\/  =  U*  U  C/|. 

Lemma  3.2  (Proposition  3.38  in  [7]).  For  two  arbitrary  sets  Ui  and  U2  with  finite 
perimeters,  we  have 


(3.1)  Per{Ui)  +  Per([/2)  >  Per(C/i  n  C/2)  +  Per(C/i  U  C/2) 


This  property  is  also  called  the  submodularity  of  Per  functional.  The  following 
example  shows  that  the  above  inequality  can  hold  strictly. 

Example  Let  Ui  be  a  square  with  opposite  corners  at  (0,0),  (—1, 1)  and  C/2  be  an¬ 
other  square  with  opposite  corners  at  (0,  0),  (1, 1).  Ui  has  its  entire  right  edge  touch¬ 
ing  the  entire  left  edge  of  C/2.  According  to  the  definition,  Per(C/i)  =  Per(C/2)  =  4, 
Per(C/i  n  C/2)  =  0,  and  Per(C/i  U  C/2)  =  6,  where  the  third  equation  holds  because 
Ui  n  c/2  has  measure  0  in  and  hence  is  untestable  by  continuous  functions. 

In  general,  if  two  sets  Ui  and  C/2  share  opposite  edges  for  a  strictly  positive  reduced 
length  (a  concept  in  geometric  measure  theory,  see  [7]),  then  this  length  is  excluded 
from  Per(C/i  n  C/2)  so  (3.1)  holds  strictly.  Lemma  3.2  is  used  in  the  proof  of  Lemma 
3.4  below. 

Lemma  3.3.  Under  Assumption  A,  the  following  inequalities  hold: 


(3.2)  0  >  Eg(C/*;  Si)  -  Eg(C/a;  ^1)  >  EaiUf,  S2)  -  Eg(C/a;  ^2). 


Proof.  The  first  inequality  follows  from  the  optimality  of  Uf .  To  prove  the  second 
inequality  in  (3.2),  we  expand  its  left-hand  and  right-hand  sides: 


(3.3) 

Eg(C/i*;5i)-Eg(C/a;^i) 

=  Per(C/i )  -  Per(C/i*  n  U^) 

(3.4) 

+  x{\uf\Si\-\{Ufnu;)\Si\) 

(3.5) 

+  \{\Si\un-\Si\{u*inum 

(3.6) 

EG(C/r;52)-EG(C/A;^2) 

=  Per(C/i‘)  -  Per(C/i*  n  C/^) 

(3.7) 

-b  a(|c/A^2|  -  l(c/i*  n  c/2*)\^2|) 

(3.8) 

+  A(|A\c/r|-|A\(c/rnc/AI) 

YIN,  GOLDFARB,  AND  OSHER 


As  (3.3)  is  identical  to  (3.6),  we  only  need  to  prove  (3.4)  >  (3.7)  and  (3.5)  >  (3.8). 
In  fact, 

\u^\s^\-m:nu^)\s^\ 

=  |(t/A(c/rn[/2*))\5i| 

=  \{u*\u;)\s,\ 

>I(A*\A*)\A|  (-.-Aca) 

=  |([/A(A*n[/2*))\A| 

=  Ar\A|-|(A*nc/*)\^2|, 

and 

|AWri-|A\(A*nc/AI 
=  |AnA1-|An(A*nc/|)l 
=  -|A  n  ((A*  n  A*)  -  W)l  (•.■  Vf  c  (A*  n  u^)) 

>  -A2n((A*nc/A-W)l  (•.'Ac  A) 

=  lAncAl  -  |An(A*n  A*)l 
=  |AWri-|A\(A*nA*)l- 


Lemma  3.4.  Under  Assumption  A,  the  following  inequalities  hold: 


(3.9)  Eg(A*;  A)  -  Eg(C/a;  A)  >  Eg(A;  A)  -  Eg{u;;  A)  >  o. 


Proof.  The  second  inequality  follows  directly  from  the  optimality  of  C/^  with  re¬ 
spect  to  S'  =  82.  To  prove  the  first  inequality,  let  us  expand  Eg(C7v;  A)  — Eg(C/|;  S2). 
(The  other  term  Eg(C/i  ;  S2)  —  EG(f7A;  S2)  was  expanded  above  in  the  proof  of  Lemma 
3.3.) 

Eg(c/C;A)-Eg(c/2*;A) 

(3.10)  =  Per(A  U  A*)  “  Per(A) 

(3.11)  +  A(|(A  U  A)\AI  -  lAAAl) 

(3.12)  +  A(|S2\(A  U  A)l  -  IA\A*I)- 

We  need  to  prove  that  (3.6),  (3.7),  and  (3.8)  are  no  less  than  (3.10),  (3.11),  and  (3.12), 
respectively.  Lemma  3.2  gives  (3.6)>(3.10).  Moreover,  we  have  (3.7)  =  (3.11)  and 
(3.8)  =  (3.12)  as  proved  below: 

IAAAI  -  |(A  n  A)\A|  =  |((A  n  A)  +  (A\A))\A|  -  |(A  n  A)\A| 

=  l(A\A)\A|  +  |(A  n  A)\A|  -  |(A  n  A)\A|  (•.'  (A  n  A)  n  (A\A)  =  0) 

=  |(A\A)\AI  + 1 A\AI  -  IA\AI 

=  l(A  +  (A\A))\A|  -  I A\A|  =  |(A  u  A)\A|  -  I A\A|  (•.■  (A\A)  n  A  =  0)- 
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and 


\S2\un  -  \S2\{u^  n  c/2*) I  =  1^2  n  cTfi  -  1^2  n  {u*  n  u*)\ 

=  -\S2n{{u*nu*)-u!)\-,  (•.■  u*  c  (u*  n  u*) ) 

=  -1^2  n  (C/At/2*) I;  (••■  t/r\c/2*  =  c/r  -  c/r  n  c/2*  =  (c/f  n  c/2*)  -  c7*  ) 
=  -|An(WW_)^  _ 

=  -lA  n  (c/2*  -  (c/2*  n  C/AI 

=  lA  n  (C/2*  u  c/*)|  -  lA  n  u*\;  (•.■  c/j  d  c/jn  A*  =  (A*  u  U*) ) 

=  |An(A2*  +  A*\A2*)|-|AncA| 

=  |A\(c/2*  +  A*\c/2*)I  -  IA\c/2*|  =  A2\(A*  u  c/2*)|  -  \S2\U^\. 


□ 

Proof,  (of  Theorem  3.1)  Concatenating  the  inequalities  in  Lemmas  3.3  and  3.4 


gives  us 

0  > 

EG(Af;Si)-EG(A;Si) 

(3.13) 

> 

EG(Af;S2)-EG(A;S2) 

> 

Eg(A;A)-Eg(A2*;S2) 

> 

0. 

Therefore,  all  inequalities  above  hold  as  equalities.  It  follows  that  C/a  and  C/v  minimize 
Eg(A;5'i)  and  Eg(C/;S'2),  respectively.  We  have  found  a  solution  pair  •= 

(C/a,  a)  such  that  C  Aj  □ 

Corollary  3.5.  Under  Assumption  A,  Aa  minimizes  'Eg^U;  Si) ;  Ay  minimizes 
Eg(AA). 

The  following  two  corollaries  extend  the  above  geometric  results  to  the  minimizers 
of  the  TV-A^  energy  E. 

Corollary  3.6.  Let  u*  and  minimizes  the  TV-L^  energies  E(u;A, /i)  and 
E(u;A,  A),  respectively,  where  fi  >  fi  point-wise.  We  have 

1.  if  either  u\  or  uj  *5  a  unique  minimizer,  then  u*  < 

2.  otherwise,  i.e.,  both  problems  with  inputs  f  =  fi  and  /  =  A  have  nonunique 

solutions,  then  there  exists  a  solution  pair  such  that  u\  <  u^. 

Corollary  3.7.  Under  the  same  assumption  as  in  the  above  corollary,  min(M*,  rt^) 
and  m.ax{ul,U2)  minimize  E(m;  A,  /i)  and  E(m;  A,  A),  respectively. 

The  above  two  corollaries  can  be  proved  by  first  using  the  Layer  Cake  formula 
(2.1)  to  express  the  TV-A^  energy  as  an  integral  of  geometry  energies  over  all  super 
level  sets  and  then  applying  the  results  of  Theorem  3.1  and  Corollary  3.5  to  the 
geometry  energy  at  each  level.  We  leave  these  proofs  to  the  reader. 

4.  Constructing  u*  from  U*.  In  this  section  we  show  how  Theorem  3.1  can 
be  applied  to  construct  a  minimizer  u*  of  E(m;  A,  /)  from  minimizers  of  Eg(A;  A,  S')- 
Given  an  observed  image  /,  we  let  the  inputs  S  be  the  level  sets  of  /,  i.e.,  S  =  A(/,  qi), 
which  are  contained  inside  one  another  as  p,  is  increased,  i.e.,  L{f,pL2)  Q  -b(/, /ii)  if 
A2  >  Ai-  From  Theorem  3.1,  the  following  collections  of  sequences  of  minimizers  A* 
of  Eg(A;  a,  A(/,  fi))  is  well-defined  and  nonempty: 


(4.1)  U*(/)  : 


j.  V/r  G  M,  A*  is  a  RiiniRiizer  of  Eg(A;  A,  A(/,  ^)) 

lAAeR:  andAACA*AA2>Ai- 
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Let  G  U*(/),  i.e.,  {u;  }^gR  is  a  specific  sequence  of  sets  in  the  collection 

![)*(/),  in  which  the  minimizers  U*  of  Eig{U;  X,  L{f,  n))  are  contained  inside  one 
another  as  fx  is  increased.  We  define  u  pointwise  by 


(4.2)  u{x)  =  sup{^  :  X  G  U*}. 

A  geometrically  clearer  version  of  the  above  construction  is  the  following:  let 


then  we  have 
(4.3) 


/  =  /+-/■,  (/+,/■  >0) 

{[/+U«  G  U*(/+)  and  {C/-U«  G  U*(/-), 


1A+  U~ 


where  u'^  and  u~  are  built  up  by  stacking  the  shrinking  (and  nested)  U^’s  and  U~’s. 
One  can  easily  check  that  (4.2)  and  (4.3)  are  equivalent.  Darbon  and  Sigelle  [24] 
independently  obtained  a  similar  result  for  their  discretized  version  of  a  variant  of  the 
TV-L^  problem,  which  is  a  linear  program. 

Below  is  the  main  result  of  this  section  based  on  the  above  construction. 

Theorem  4.1.  The  function  u{x)  given  by  (4.2)  minimizes  E(u;A,/)  where 
{u;}  is  any  sequence  from  the  collection  of  sequences  U*(/)  given  by  (4.1). 

Proof.  By  the  construction  formula  (4.2),  the  level  set  of  u 


(4.4)  L{u,  fx)  =  {x  :  u{x)  >  /i}  =  {x  :  3/x,  5  x  G  U*  and  g.  >  fx} 

Since  f7*  C  [7?  if  ^  >  Jx,  the  condition  of  the  existence  of  a  /i  that  gives  both  x  G  U* 
and  IX  >  fx  in  the  right-hand  side  of  (4.4)  holds  if  and  only  ii  x  G  U^.  It  follows  that 

(4.5)  L{u,ix)  =  U;. 

Since  each  L(u,  ix)  minimizes  the  integrand  of  the  integral 

/OO 

EG(C/;A,L(/,M))dM 

-OO 

over  all  levels  /r,  the  whole  integral  is  minimized  and  is  equivalent  to  E(m;  A,  /)  by  the 
Layer  Cake  formula  (2.1).  □ 

In  Section  5  below.  Theorem  4.1  is  used  to  characterize  those  image  features  that 
appear  in  u*  and  those  that  end  up  in  u*  =  /  —  m*  . 

In  the  light  of  this  theorem,  one  may  wonder  whether  this  equivalence  between 
the  TV-L^  model  (1.1)  and  a  series  of  geometry  problems  makes  it  easier  (or  at  least 
provides  a  more  geometric  way)  to  find  a  solution.  There  is  both  bad  and  good  news. 
In  theory,  the  number  of  different  geometry  problems  we  need  to  solve  is  infinite 
while,  in  practice,  when  processing  computerized  images  this  number  is  finite  and  is 
limited  by  the  number  of  grey-scale  levels  (typically  256  or  2^®).  However,  obtaining 
a  solution  of  (2.2)  is  non-trivial  because  it  is  nonconvex  and  has  nonunique  solutions 
in  general.  Let  us  again  examine  the  disk  example  S  =  Br(0)  with  radius  r  for 
an  illustration  of  solution  nonuniqueness:  if  S'  =  5^(0)  and  A  =  2/r,  then  both 
U  =  Br{0)  and  U  =  0  minimize  the  geometry  energy  while  any  other  sets,  especially 
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those  U'  =  Bf{0)  satisfying  0  <  f  <  r,  are  not  minimizers;  if  0  <  A  <  2/r,  then 
U  =  Br{0)  is  a  local,  non-global,  minimizer  in  then  sense  that  U  =  Br±e{0)  for  any  e 
small  gives  higher  energies  (so  U  =  Br{0)  is  locally  minimal)  but  C/  =  0  is  the  unique 
global  minimum.  This  suggests  that  a  global  minimization  algorithm  for  solving  (2.2) 
may  have  to  examine  a  large  number  of  sets  before  restricting  its  search  locally. 

A  recent  algorithm  that  falls  into  this  category  is  one  proposed  by  Darbon  and 
Sigelle  [24]  for  solving  the  anisotropic  TV-L^  model,  where  the  1-norm  of  the  gradient 
of  u  is  used  in  the  the  definition  of  TV (u)  in  place  of  the  2-norm.  This  algorithm 
is  based  on  sampling  a  256-level  Markov  random  field  (MRF)  on  grey-scale  images. 
It  uses  the  Layer  Cake  formula  and  associates  a  binary  MRF  with  each  level  set  of 
an  image  by  reformulating  and  decomposing  the  geometry  energy  (2.2)  as  conditional 
posterior  energies  at  each  pixel  (i.e.,  at  each  site  in  the  MRF).  To  optimize  each  binary 
MRF,  i.e.,  to  find  a  lowest  energy  configuration,  they  used  a  min-cut  algorithm  [31] 
that  finds  the  global  minimizer  in  polynomial  time.  This  is  not  surprising  despite  the 
fact  that  the  geometry  problem  may  have  strictly  local  solutions.  Although  the  min- 
cut  problem  is  a  0-1  combinatorial  problem,  its  dual  is  the  max  flow  problem,  which 
is  well  known  to  be  polynomial-time  solvable.  This  link  between  min-cut  problems 
and  TV-L^  has  the  promise  of  providing  extremely  efficient  algorithms  for  the  latter. 

After  our  work  was  first  submitted,  an  anonymous  referee  brought  to  our  attention 
the  recent  work  [16]  by  A.  Chambolle,  which  exploits  the  connection  between  the 
anisotropic  ROF  model  and  a  series  of  decoupled  geometry  problems  to  solve  the 
ROF  model  using  a  min-cut  based  algorithm  that  is  essentially  identical  to  the  one 
proposed  in  [24]. 

5.  Feature  selection.  In  the  previous  sections,  the  parameter  A  in  the  TV-L^ 
model  (1.1)  was  fixed.  In  this  section,  however,  we  vary  A  and  relate  it  to  the  scales 
of  the  features  in  u*  and  v*.  It  is  well  know  that  Meyer’s  G-norm  (Def.  1.2)  is  a  good 
measure  of  the  oscillation  of  functions  [32].  Using  G- value  [40]  defined  below,  which 
is  an  extension  of  Meyer’s  G-norm  proposed  by  Scherzer  et  al,  we  are  able  to  fully 
characterize  v*  for  a  given  parameter  A.  To  emphasize  the  role  of  A  in  determining 
u*,  V*,  U*,  and  V*,  we  add  A  as  a  subscript  to  these  quantities  (i.e.,  we  write  u*^, 
and  in  this  section. 

In  general,  the  TV-L^  model,  using  a  particular  value  of  the  parameter  A,  returns 
an  image  combining  many  features.  In  certain  applications  one  is  interested  in  ex¬ 
tracting  small  and/or  large-scale  features  in  an  image.  Therefore,  we  are  interested 
in  determining  a  A  that  gives  all  targeted  features  with  the  least  number  of  unwanted 
features  in  the  output.  Below  we  show  how  to  choose  an  appropriate  A  that  will  allow 
us  to  extract  geometric  features  of  a  given  scale,  measured  by  the  G-values  of  their 
level  sets. 

Definition  5.1.  (G-value) [40]  Let  ik  :  ^  2'®  &e  a  set-valued  function  (also 

called  a  multifunction  and  a  set-valued  map)  that  is  measurable  in  the  sense  that 
4'”^(S')  is  Lehesgue  measurable  for  every  open  set  S'  C  K.  We  do  not  distinguish  be¬ 
tween  Ik  being  a  set-valued  function  and  a  set  of  measurable  (single-valued)  functions, 
and  let 


vk  :=  {■)/  :  t[  is  measurable  and  t[{x)  G  'k(a;),V  x}. 

The  G-value  of  ik  is  defined  as  follows: 


sup  inf 


(5.1) 


G(vk)  : 


i[{x)h{x)dx. 


12 


YIN,  GOLDFARB,  AND  OSHER 


An  easy  way  to  understand  the  above  definition  is  to  compare  the  definitions  of 
G-value  and  G-norm.  Since  the  G-norm  of  a  function  ^|)  can  be  defined  as 


(5.2)  G({V'})  =  sup  [  il;{x)h{x)dx  =  \\tp\\G, 

heCl^:f  |V/i|  =  l  j 

where  a  single-function  set  {ip}  replaces  'h  in  (5.1),  the  G-value  can  be  viewed  as  an 
extension  of  the  G-norm  to  set-valued  functions.  In  [40]  Scherzer  et  al  applied  the 
G-value  to  the  sub-differential  of  the  absolute  value  of  /,  d\f\,  and  the  Slope  from  [8] 
of  /  to  determine  when  or  vanishes: 

Theorem  5.2.  [40]  Let  d\f\  denote  the  set-valued  suh- differential  o/|/|,  i.e., 


[[-1,1]  f[x)  =  0. 


Then,  for  the  TV-L^  problem  (1.1) 

1.  =  0  (TJ  =  f)  is  an  optimal  solution  if  and  only  if 

2.  u\  =  f  (v’l  =  0)  is  an  optimal  solution  if  and  only  if 


A  > 


\\Df\\-\\Dhy 
heBV  J\f-h\ 


Instead  of  directly  applying  Theorem  5.2  to  the  input  /,  we  apply  it  to  the 
characteristic  functions  of  the  level  sets  of  /.  We  easily  have  the  following  results  as 
a  corollary  of  Theorem  5.2: 

Corollary  5.3.  For  the  geometric  problem  (2.2)  with  a  given  X, 

1.  1/^  =  0  (Vf  =  S)  is  an  optimal  solution  if  and  only  if 


G(9|l5|)’ 


2.  =  S  (Vf  =%)  is  an  optimal  solution  if  and  only  if 


A  > 


II  - 

h(^BV  J  jls  —  h\ 


The  above  corollary  characterizes  ^  in  (4.2)  for  given  level  p,  and  scalar  A.  Sup¬ 
pose  that  the  set  S'  of  a  geometric  feature  coincides  with  L{f,  p)  for  p  G  [po,  pi).  Then, 
for  any  A  <  1/G(9|ls|),  S  is  not  observable  in  u*^.  This  is  because  1/G(9|li(y_^)|) 
is  increasing  in  p  and  therefore,  for  p  >  po,  vanishes.  On  the  other  hand,  once 
A  >  1/G(9|ls|),  according  the  above  corollary,  ![/*  ^  0  for  ^  G  [po,pi),  which 

implies  that  at  least  some  part  of  S  can  be  observed  in  u*^.  If  A  is  increased  further 
such  that  A  >  sup^jg^ydlZHs]]  -  \\Dh\\)/ ^\ls  -  h\,  we  get  =  L{f,p)  =  S  for 
all  p  G  [po,  pi)  and  therefore,  the  feature  is  fully  contained  in  which  is  given  by 
(4.2).  In  general,  although  a  feature  is  often  different  from  its  vicinity  in  intensity, 
it  cannot  monopolize  a  level  set  of  the  input  /,  i.e.,  it  is  represented  by  an  isolated 
set  in  L{f,p),  for  some  p,  which  also  contains  sets  representing  other  features.  Con¬ 
sequently,  that  contains  a  targeted  feature  may  also  contain  many  other  features. 
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However,  from  Theorem  4.1  and  Corollary  5.3,  we  can  easily  see  that  the  arguments 
for  the  case  S  =  L{f,  /r)  still  hold  for  the  case  S  C  L{f,  /i). 

Suppose  there  is  a  sequence  of  features  in  /  that  are  represented  by  sets  Si,  S2,  ■  ■  ■ ,  Si 
and  have  distinct  intensity  values.  Let 


(5.4) 


Ar"  := 


G(a|isJ)’ 


^max  . _ 


=  sup 
hGBV 


\DlsA\  -  \\Dh\\ 


for  z  =  1, . . . , /.  If  the  features  have  decreasing  scales  and,  in  addition,  the  following 
holds 


(5.5) 


^min  ^  ^max 


< 


<  . . .  <  A)"'" 


^  \  max 

S  , 


then  feature  z,  for  z  =  1, . . . ,  Z,  can  be  precisely  retrieved  as  zz^max_|_g  —  u^min_g  (here  e 

is  a  small  scalar  that  forces  unique  solutions  because  A”""  =  A™^  is  allowed) .  This  is 
true  since  for  A  =  A““  —  e,  feature  z  completely  vanishes  in  u*^,  but  for  A  =  +  e, 

feature  z  is  fully  contained  in  zz^  while  there  is  no  change  to  any  other  features. 

6.  Geometric  and  morphological  invariance.  Based  on  the  results  and  dis¬ 
cussion  in  the  previous  sections,  we  now  show  that  the  TV-L^  model  is  invariant  under 
certain  geometric  and  morphological  transformations. 

Proposition  6.1.  (Geometric  invariance)  The  TV-L^  decomposition,  defined  as 
the  operator  Tt,  is  invariant  under  the  following  geometric  transformations: 

1.  Translation  t  (shifting  the  coordinates  by  a  constant):  Tt  o  t  =  t  o  Tt; 

2.  Isometry  transformation  R:  Tt  o  R  =  R  o  Tt; 

3.  Scaling  p  >  0.'  Ttp  o  p  =  p  oTt,  where  p  •  x  =  px. 

Proof.  The  proof  is  simple  since  it  is  sufficient  to  provide  a  proof  for  the  geometry 
problem  (2.2).  The  geometry  formulation  (2.2)  has  the  straightforward  properties  of 
translational  and  isometric  (e.g.,  rotational)  invariance.  If  the  rz-dimensional  geometry 
sets  U  and  S  in  (2.2)  are  both  uniformly  scaled  by  a  constant  p,  then  Per([/^)  and 
equal  p”“^Per([/)  and  p”|Z7AS'|,  respectively.  One  can  scale  A  in  (2.2)  by 
1/p  and  obtain  an  energy  homogeneously  scaled  by  p"“^: 

p”-i  (Per(C/)  -L  A|[/A5|)  =  Per(C/'’)  -L  A/p|C/'’AS''’|. 


By  noticing  l/(tp)  =  A/p,  therefore,  Ttp  o  p  •  /  =  {u*)p  =  p  o  Tj  •  /  for  any  /.  □ 

Proposition  6.2.  (Morphological  invariance)  Let  C  he  a  constant  scalar  and  g 
be  an  increasing  real  function.  Then, 


(6.1)  Tf{f  +  C)  =  {Tff)  +  C,  Ttog  =  goTt. 


Proof.  These  results  are  simple  consequences  of  Theorem  4.1,  which  states  that 
minimizing  E(t6;  A,  /)  can  be  decoupled  into  independent  minimizations  of  'EcrfU;  A,  L{F,  p)) 
over  a  range  of  values  of  p.  □ 

Since  log  is  an  increasing  function,  we  have  the  following  from  Proposition  6.2: 
Example  Suppose  /  >  1  strictly,  then  Tt  ■  log  /  =  log(rt  •  /)  =  logzz*.  Let  /  =  log  / 
and  u  =  Tf  f ,  then 

(6.2)  —  =  =  e^-^. 

u* 

This  means  that  the  following  two  processes  are  equivalent: 

(1)  Take  /,  obtain  u*  by  applying  the  TV-L^  decomposition  to  /,  and  get  v'  = 
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Fig.  6.1.  The  LTV  model.  From  left  to  right:  f,  u* ,  v* ,  /(20, :),  =  fju*.  /(20, :)  is  the 

signal  along  the  line  depicted  in  f . 


(2)  Take  the  logarithm  /  of  /,  obtain  v  =  f  —  uhy  applying  the  TV-L^  decomposition 
to  /,  and  get  the  same  v'  =  exp(w). 

This  example  was  explored  by  Chen  et  al  [21]  who  proposed  the  Logarithm  TV  (LTV) 
model  for  preprocessing  face  images  to  correct  varying  illuminations  prior  to  auto¬ 
mated  face  recognition.  In  a  face  image,  one  half  of  the  face  can  look  brighter  than 
the  other  half  if  the  light  shines  on  the  face  from  one  side.  An  extreme  case  occurs 
when  a  point  light  source  is  located  exactly  to  the  left  of  the  face.  In  such  a  case  the 
right  half  of  the  face  only  receives  a  very  little  amount  of  ambient  light,  resulting  in 
face  images  with  very  unbalanced  brightness  and  contrast.  In  Figure  6.1  we  present 
a  face  /  to  illustrate  this  issue.  Geometric  information  intrinsic  to  the  face  must  be 
extracted  for  use  by  distance-based  algorithms  for  comparing  an  inquiry  image  with 
reference  images.  It  is  well  known  in  signal  processing  that  the  logarithm  function, 
which  is  steep  near  1  and  flat  near  oo,  can  be  applied  to  a  grey-scale  image  /  to  enhance 
the  contrast  of  its  low-light  range  signal,  corresponding  to  the  range  of  small  intensity 
values.  Therefore,  the  authors  applied  the  TV-L^  model  to  log  /  and  examined  the 
small-scale  output  v  and  its  restored  signal  exp('!;).  They  found  that  v'  =  exp(z;) 
(Figure  6.2)  contains  signal  of  small-scale  facial  features  that  does  not  vary  too  much 
among  the  images  of  the  same  face  under  different  illuminations.  Their  experiments 
based  on  angle  testing  and  principal  component  analysis  (PCA)  then  proved  that 
v'  =  exp(u)  has  illumination-free  signal  that  is  an  ideal  element  for  distance  measure. 
Specifically,  they  treat  two  exp(u)’s  as  two  vectors  wi  and  W2  and  measure  their  dis¬ 
tance  by  the  cosine  of  the  angle  between  them  (i.e.,  (wi,  W2)/(|| wi ||  ||u;2||)).  Their 
analysis  in  [21]  provides  an  explanation  for  LTV’s  excellent  performance  that  is  based 
on  the  relationship  between  v'  =  f  /u*  and  a  multiplicative  light  reflection  model. 

7.  Smooth  approximation  of  the  TV-L^  model.  In  contrast  with  the  pre¬ 
vious  sections  where  we  followed  a  geometric  approach,  in  this  section  we  analytically 
study  the  approximate  TV-L^  energy 

(7.1)  E,(u;A,/)=  [  [Vul  +  A  [  ^{f-uy  +  s, 

Jn  Jn 

defined  in  a  bounded  convex  open  set  with  a  boundary  (typically  rectangular). 
We  let  rt*  denote  the  unique  minimizer  of  Ee(u;  A,  /)  and  v*  =  f  —  u^.  Because  the 
energy  is  convex  but  nonsmooth,  PDE-based  iterative  methods  [39,  15]  inevitably 
employ  a  smoothing  regularization  like  (7.1).  For  large-scale  and  nondecomposible 
problems,  such  as  those  arise  in  processing  3D  and  4D  medical  images,  this  type  of 
method  is  the  only  one  that  does  not  exceed  memory  limits.  Therefore,  it  is  important 
to  understand  the  behavior  of  the  approximation  (7.1).  Below  we  characterize  the  G- 
norm  of  the  minimizers  of  (7.1),  which  allows  us  to  compare  (7.1)  with  the  ROF  model 
[39]  and  view  the  results  in  the  previous  sections  from  a  different  perspective.  We 
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Fig.  6.2.  The  LTV  model.  First  3  rows:  input  face  images  f ;  Last  3  rows:  illumination-free 
output  images  v'  =  f  /u*  for  automated  face  recognition. 


note  that  the  TV  term  is  also  not  smooth  and  its  approximation  is  discussed  in  [1]. 

In  our  proof  below  of  Theorem  7.3  of  the  convergence  of  the  solutions  of  the  per¬ 
turbed  TV-L^  model  (7.1)  to  the  solution  of  the  TV-L^  model,  we  need  the  following 
result. 

Lemma  7.1  (General  Minimax  Theorem  [25,  41]).  Let  K  he  a  compact  convex 
subset  of  a  Hausdorff  topological  vector  space  X,  C  he  a  convex  subset  of  a  vector 
space  Y ,  and  f  be  a  real-valued  functional  defined  on  K  x  C  which  is  (1)  convex  and 
lower-semicontinuous  in  x  for  each  y,  and  (2)  concave  in  y  for  each  x.  Then 

inf  sup/(a:,j/)  =  sup  inf  /(a;,?/). 

XeK  y^C 


We  also  need  the  following  technical  lemma. 

Lemma  7.2.  The  sets  Gq  :=  {v  :  v  =  div(g),  g  G  (70(11;®"),  ||  \g{x)\i2  jj^cx,  <  1  } 
and  BVq  '■=  {u  G  L^{Ll)  :  \\Du\\  <  i?,  ||u||ii  <  ||/||li}  C  BV{Ll),  where  R  is  given, 
are  convex.  Moreover,  BVq  is  compact  in  L^. 

Proof.  Suppose  Vg  and  Vh  are  in  Gq.  There  exist  g,h  &  (7o(n;  ®")  satisfying 


Vg  =  div{g),  Vh  =  dW{h),  \\\g\i2  \\l--  <  1,  ||  I^Ig  |U~  <  1. 
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For  any  A  €  [0, 1],  we  have  (Minkowski  inequality) 

II  1-^5  +  (1  ~  A)(i|;2  II^cx,  <  II  X\g\i2  +  (1  —  \)\h\i2  H^cx, 

<A||  |5|p  |U<»  +  (1-A)|||%  |U<» 

<A+(1-A)  =  1. 

This  means  ||Aug  +  (1  —  A)u;i||g  <  1;  consequently,  Xvg  +  (1  —  X)vh  G  Go,  he.,  Go 
is  convex.  The  convexity  of  BVq  can  be  proved  analogously  from  its  definition.  The 
compactness  of  BVq  in  follows  from  the  more  general  result  that  any  BF-bounded 
set  is  relatively  compact  in  for  1  <  p  <  n/(n  —  1)  [1,  7,  26].  □ 

Theorem  7.3.  The  solution  u*  G  BV{Tl)  of  the  approximate  TV-L^  model  (7.1) 
satisfies 


l|sign£«)||G  <  1/A, 

where  signg(.)  is  defined  pointwise  by  sign^ ((/) (x)  :=  g{x)/ y^\g{x)\‘^  +  e  for  any  func¬ 
tion  g. 

A  proof  for  more  general  cases  can  be  found  in  [38] .  We  give  a  short  proof  below 
based  on  Lemma  7.1.  A  similar  approach  is  also  used  in  [28]  to  derive  the  G-norm 
related  properties  for  the  ROF  model  [39]. 

Proof.  Let  R  (in  the  definition  of  BVq  in  Lemma  7.2)  be  large  enough,  and 
consider  the  functional  L  :  BVq  x  Gq  ^  M.: 


Lg{u,w)  =  /  uw  +  X\/ (/  —  ufi  +  £. 

Ja 

Define  Pe{u)  =  suPmeGo  Ge(u,  w)  and  Dfiw)  =  inf„gBVo  Le{u,w).  Lfiu,w)  is  convex 
and  lower  semi-continuous  in  u,  and  is  linear  (hence  concave)  in  w. 

Since  Go  is  complete  w.r.t.  ||  •  ||g,  there  exists  an  optimal  w*{u)  satisfying  Pe{u)  = 
L^{u,w){u))  for  each  m  G  BVq.  On  the  other  hand,  by  applying  the  property  of 
(relatively)  weak  compactness  of  BVq  in  for  1  <  p  <  n/(n  —  1)  [1,  7,  26,  38,  46], 
we  have  an  optimal  u)  G  BVq  that  minimizes  Pe{u). 

The  obtainability  of  optimizers  and  Lemma  7.2  allow  us  to  apply  Lemma  7.1  to 
Lfiu,w):  there  exists  an  optimal  solution  pair  {ul,w*)  G  BVq  x  Gq  such  that 

(7.2)  Dfiw))  =  Lfiu*,w))  =  Pfiu*). 


The  first  equation  in  (7.2)  indicates  dLg{u,w)) /du\u=u*  =  0,  and  this  gives 


(7.3) 


w*  =  X 


vl^  P  s 


where  v)  =  f  —  u).  Therefore,  ||sign,,(u/)||G  <  1/A.  □ 

Corollary  7.4.  // ||sign,,(/)||G  <  1/A,  u\^e  =  0  minimizes  Ee(u;A, /). 
Proof  Let  w*  =  Xsign^{f)  and  v*  =  f.  Noting  that  u*  =  0,  we  have 


Dfiw))  =  LM,0  =  PeK)- 


The  corollary  then  follows  from  the  optimality  of  the  saddle  point  {u).,w)).  □ 

Corollary  7.5.  If  ||sign,,(/)||G  >  1/A,  then  there  exists  an  optimal  solution  u* 
satisfying 

•  l|signe«)||G  =  1/A; 
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•  f  u*  signg(t!j)  =  ||_DuJ||/A,  i.e.,  u*  and  signg(vj)  form  an  extremal  pair. 
Proof.  Since  ||signg(/) ||g  >  1/A  but  ||signg(ti/)||G  <  1/A  by  Theorem  7.3,  we 
must  have  u/  ^  0.  Then,  we  have  Hic/Hg  =  1  from  the  second  equation  in  (7.2).  It 
follows  from  (7.3)  that  ||signg(u/)||G  =  1/A.  The  second  result  of  Corollary  7.5  follows 
from  the  equations  u*iu*  =  sup^  eGo  =  ^signg«).  □ 

According  to  Theorem  7.3  and  its  corollaries,  the  approximate  TV-L^  model 
performs  a  soft  thresholding  on  ||signg(/)||G.  If  this  value  is  bigger  than  1/A,  a  part 
of  the  signal  /,  v*  with  ||signg(w/)||G  =  1/A,  is  removed  from  /;  if  less  or  equal  to 
1  /A,  the  whole  signal  v*  =  f  is  removed.  This  does  not  contradict  the  behavior  of  the 
exact  TV-L^  model.  As  e  ^  0,  signg(/)  converges  to  the  characteristic  function  of  the 
support  of  /.  Therefore,  the  thresholding  depends  mainly  on  the  shape  of  /  rather 
than  its  value.  The  smaller  that  e  is,  the  less  the  value  /  affects  the  thresholding. 
This  matches  the  results  based  on  the  G-value  in  Section  5.  We  point  out  that  the 
need  for  the  G-value  in  the  analysis  of  the  exact  model,  as  an  extension  of  the  G-norm 
to  set-valued  functions,  is  related  to  the  fact  that  signg(-)  does  not  converge  to  the 
signum  function  sign(-)  uniformly.  Instead,  we  can  think  that  signg(-)  “converges”  to 


sign  :  M  I— >  2®,  sign(x) 


{sign(a;)},  if  x  0; 
[— 1, 1],  if  X  =  0. 


Finally,  we  show  that  the  solution  of  the  approximate  model  converges  in  to  that 
of  the  exact  model. 

Theorem  7.6.  Assume  u*  minimizes  Ee  (m;  a,  /) .  There  exists  a  minimizer  u  of 
E(u;  A,  /)  such  that 


lim  IK  -  u\\l^  =  0,  lim  |K  -  v||z,i  =  0. 

ej.0+  £i0+ 

Proof.  Noticing  that  \/t  +  e  <  Vi  +  for  t,£  >  0,  we  have,  for  all  positive  s 
less  than  a  given  ep  and  any  minimizer  u*  of  E(u;  A,  /), 

(7.4)  E,«;  A,  /)  <  E,(u*;  A,  /)  <  E(u*;  A,  /)  +  K. 

From  this  we  conclude  that  Ee('u*),  and  thus  ||iAM*||,  are  bounded.  Since  fl  is  bounded 
(/  has  compact  support),  there  exists  u  G  BV{fl)  such  that  limi^oo  —  u\\li  =  0 
with  limi^oo  Si  =  0.  The  optimality  of  u  follows  from 

E{u;X,f)  =  \\Du\\+X  /l«-/l 
=  \\Du\\  +  X  lim  f  J (u*.  —  /)2  -I-  Si  (dominant  convergence) 


<  lim  inf  IjiAu*.  ||  -b  A  /  V  {u*.  —  f)"^  +  Si  (lower  semi-continuity) 

i — »-oo  *  J  *  * 

=  liminf  Ee.(u*.;  A,  /) 

<E(u*;A,/)  (by  (7.4)). 

Since  f  &  and  hence  v  =  f  —  u  G  L^,  we  also  have  lim£^o+  ||tJ  —  v\\m  =  0.  □ 
Unfortunately,  an  error  estimate  independent  of  /  is  not  available.  We  tested 
||u*  —  11111,1  for  different  /’s  and  obtained  different  orders  of  magnitudes  of  e. 
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Table  8.1 

The  G-values  (i.e.,  G(|91s.  |) )  and  of  feature  components  Si, ,  S5; 

Ai , . . . ,  Ag  used  to  obtain  ui, . . .  ,uq. 


S2 

^3 

S4 

^5 

G-value 

19.39390 

13.39629 

7.958856 

4.570322 

2.345214 

Amin 

0.0515626 

0.0746475 

0.125646 

0.218803 

0.426400 

Ai  = 

A2  = 

A3  = 

A4  = 

As  =  As  = 

0.0515 

0.0746 

0.1256 

0.2188 

0.4263  0.6000 

8.  Numerical  results.  In  this  section,  we  present  numerical  results  for  the  TV- 
model  on  multiscale  feature  selection.  Since  the  second-order  cone  programming 
(SOCP)  approach  [27,  45]  has  proven  to  give  very  accurate  solutions  for  solving  TV- 
based  image  models,  we  formulated  the  TV-L^  model  (1.1)  and  the  G-value  formula 
(5.1)  as  SOCPs  and  solved  them  using  the  commercial  optimization  package  Mosek 
[33]. 

The  set  of  results  depicted  in  Figure  8.2  were  obtained  by  applying  the  TV-L^ 
model  with  different  A’s  to  the  composite  input  image  depicted  in  Figure  8.1  (/).  Each 
of  the  five  components  in  this  composite  image  is  depicted  in  Figure  8.1  (S'i)-(S'5). 
They  are  the  image  features  that  we  are  interested  in  extracting  from  /.  We  name 
the  components  by  Si, ...  ,5*5  in  the  order  they  are  depicted  in  Figure  8.1.  Clearly, 
they  are  decreasing  in  scale.  This  is  further  shown  by  their  decreasing  G-values  (i.e., 
G(j9lsJ)),  and  hence,  their  increasing  values  (see  (5.4)),  which  are  given  in 
Table  8.1.  We  note  that  for  i  =  1,...,6,  are  large  since  the  components  do 

not  possess  smooth  edges  in  the  pixelized  images.  This  means  that  the  property 
(5.5)  does  not  hold  for  these  components,  so  using  the  lambda  values  Ai, . . . ,  Ag  given 
in  Table  8.1  does  not  necessarily  give  the  entire  feature  signals  in  the  outputs  Ui, 
i  =  1, . . . ,  6.  We  can  see  from  the  numerical  results  depicted  in  Figure  8.2  that  we 
were  able  to  produce  an  output  Ui  that  contains  only  those  features  with  scales  larger 
that  1/Ai,  leaving  in  Vi  only  a  small  amount  of  the  signal  of  these  features  near  non¬ 
smooth  edges.  For  example,  we  can  see  the  white  boundary  of  S2  in  U3  and  four 
white  pixels  corresponding  to  the  four  corners  of  S'3  in  U4  and  U5.  This  is  due  to  the 
nonsmoothness  of  the  boundary  and  the  use  of  finite  differences.  However,  we  can 
see  that  the  numerical  results  closely  match  the  analytic  results  given  in  Subsection 
4.1.  Mi’s  contain  signal  increasing  in  scale  and  Uj’s  contain  the  residual,  which  is 
decreasing  in  scale.  Using  the  Af“”  values,  we  were  able  to  get  the  desired  features 
in  u  and  v.  Moreover,  by  forming  differences  between  the  outputs  ui, . . .  ,uq,  we 
extracted  individual  features  Si,. ..,  S5  from  input  /.  These  results  are  depicted  in 
the  last  two  rows  of  images  in  Figure  8.2. 

Besides  multiscale  feature  selection  demonstrated  in  the  test  above,  the  TV-T^ 
decomposition  can  also  be  used  to  filter  ID  signal  [3],  to  remove  impulsive  (salt-n- 
pepper)  noise  [35],  to  extract  textures  from  natural  images  [45],  to  remove  varying 
illumination  in  face  images  for  face  recognition  [22,  21],  to  decompose  2D/3D  images 
for  multiscale  MR  image  registration  [20],  to  assess  damage  from  satellite  imagery  [19], 
and  to  remove  inhomogeneous  background  from  cDNA  microarray  and  digital  micro¬ 
scopic  images  [44].  These  interesting  results  were  obtained  before  their  theoretical 
basis  was  proven  above.  We  believe  there  exist  broader  and  undiscovered  applications 
of  the  TV-L^  model  or  variants  of  it,  and  we  hope  that  the  insights  into  the  TV-L^ 
model  provided  here  help  in  identifying  such  applications. 
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(S4)  (S5)  if) 

Fig.  8.1.  (Si)-(S5):  individual  feature  components  of  composite  image  (f). 


Acknowledgement.  The  authors  want  to  express  their  appreciations  to  Ter¬ 
rence  Chen  (UIUC  and  Siemens  Corporate  Research),  Kevin  Vixie  (Los  Alamos  Na¬ 
tional  Lab),  Bill  Allard  (Duke),  Selim  Esedoglu  (U.Mich),  Tony  Chan  (UCLA),  and 
Otmar  Scherzer  (U. Innsbruck).  Dr.  Chen’s  early  broad  and  successful  applications  of 
the  TV-L^  model  encouraged  us  to  explorer  the  model.  Dr.  Vixie’s,  Prof.  Esedoglu’s, 
and  Prof.  Allard’s  on-going  work  and  many  discussions  with  the  first  author  have 
given  us  insight  into  the  geometric  properties  of  the  model.  Prof.  Chan  and  Prof. 
Esedoglu  introduced  the  TV-L^  model  to  us,  and  their  work  [17]  is  the  foundation 
of  this  paper.  Our  communications  with  Prof.  Scherzer  inspired  us  to  associate  the 
G-value  with  scale-based  feature  selection.  Finally,  we  thank  an  anonymous  referee 
for  informing  us  of  related  work  and  helping  us  improve  this  paper. 


20 


YIN,  GOLDFARB,  AND  OSHER 


Fig.  8.2.  TV-L^  decomposition  outputs  (ui  and  vi  were  obtained  using  Xi). 
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Appendix  A. 

Proposition  A.l  (Annulus  input).  Let  the  observed  image  he  the  characteristic 
function  /  =  r-aly)’  0  <  r2  <  ri,  which  has  intensity  c>  Q  at  all  points  in  the 

annulus  Ar^^r2{y)  •=  {{xi-,X2)  '■  <  {xi  —  yiY  +  {x2  —  1/2)^  <  i’ll  (ind  intensity  0 

everywhere  else.  Let  u*  denote  the  minimizer  of  the  TV-L^  energy  E(w;  A, /). 

1.  Ifr2<'^,  then 


2.  Ifr2 


0, 


ul={ 


PClB,^(y),  Vp  G  [0,  1], 

PClB^^(y)  +  (l-p)/,  VpG[0,l], 


then 


*/A=^, 
r{-2ri  <  A  < 


2^ 
r2  ’ 


r’ 

=  S  PcIb,^  (y)  +  p/,  Vp,  p  G  [0, 1]  9  p  +  P  <  1, 


*/A< 
^/A  = 
^/A> 


2 

ri-r2  ’ 
2 

ri— r2  ’ 
2 

ri-r2  ’ 


5.  Lf  ^  <  r2,  then 


fo, 

<  =  <  Pf,  VpG  [0,1], 

l/> 


*/A< 
*/A  = 
*/A> 


2 

ri-r2  ’ 
2 

ri—r2  ’ 
2 

ri-r2  ’ 


Proof.  L{f,fi)  =  Ari^r2{y)  for  p  G  (0,1).  It  follows  from  u*  =  argmin„E(u;  A, /) 
and  the  Layer  Cake  formula  (2.1)  that  :=  L(u*,pf)  must  minimize 


(A.l) 


Per  (C/^)  +  A|[/^AAri,r2(2/)|. 


We  first  show  there  exists  a  minimizer  of  (A.l)  that  is  rotationally  symmetric.  Let 
U  denote  an  arbitrary  minimizer  of  (A.l)  and  suppose  that  U  is  not  rotationally 
symmetric.  Let  U{a)  denote  another  minimizer  of  (A.l)  obtained  by  rotating  U 
clockwise  around  y  for  a  radians.  It  follows  from  the  Layer  Cake  formula  (2.1)  that 

1  /•2^ 

minimizes  E(m;A, /).  Therefore,  U  :=  {x  :  ti  >  0}  is  a  minimizer  of  (A.l)  that  is 
rotationally  symmetric.  Using  this  result  we  can  narrow  the  set  of  solutions  to  the 
empty  set  and  rotationally  symmetric  (about  y)  sets.  We  now  briefly  outline  the  rest 
of  the  proof  and  leave  the  details  to  the  reader. 

First,  we  further  limit  the  search  for  a  solution  to  the  empty  set  and  rotation- 
ally  symmetric  sets  with  a  single  connected  component,  and  this  gives  a  exactly 
as  the  L(u'^,0).  Then,  allowing  U  to  have  more  than  one  connected  component,  say 
U  =  UiU  . .  .U  Un,  and  using  the  fact  that  minimizing  (A.l)  is  equivalent  to 


ininPer([/)  -L  AjU  \  Ar.,^r2{y)\  -  AjU  C  Ar^^r2{y)\ 

n 

=  mjn^  (Per([/,)  -L \\U,  \  Arj_,.,(j/)|  -  \\Ui  C  Ar^^r2(.y)\) , 
2=1 
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we  conclude  that  each  Ui  must  minimize  (A.l)  and  hence  equal  to  U^.  Therefore, 
is  a  minimizer  of  the  geometry  problem  (A.l),  and  the  proposition  follows  from 
Theorem  4.1.  □ 
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